feat: added dynamic sitemap generation using sitemap.ts in nextjs - 🏗️ #353

Closed
amaan-bhati wants to merge 7 commits into main from dynamic-sitemap

Member

amaan-bhati commented Mar 30, 2026

Related Tickets & Documents

Fixes: none. I didn't create an issue for this since I'd be picking it up myself.

Description

Replaced the legacy, manually maintained public/sitemap.xml with a fully automated, dynamic sitemap generator at app/sitemap.ts. It uses the Next.js App Router Metadata API and WPGraphQL so that every new post, category, tag, and author published on WordPress appears in the sitemap without manual intervention, within the 24-hour revalidation window.

Key Features and Automation

The script programmatically generates a sitemap at keploy.io/blog/sitemap.xml by fetching and mapping the following entities:

  • Posts: All published articles (paginated).
  • Categories: Dynamic category archive pages (e.g., /technology, /community).
  • Tags: Dynamic tag archive pages (e.g., /tag/automation).
  • Authors: Author profile pages (dynamic path /authors/[slug]).

Technical Implementation & Best Practices

  • Next.js Metadata API: Utilizes the native sitemap.ts convention, which automatically handles XML serialization and Content-Type headers.
  • Optimized GraphQL Querying: To prevent server-side overhead, we use Reverse Connection Mapping. Instead of calculating lastmod in JavaScript, we fetch the most recent post's modification date directly within the Taxonomy/Author GraphQL query:
    tags { nodes { posts(first: 1) { nodes { modified } } } }
  • Data Caching: Implements a 24-hour cache (revalidate: 86400) to ensure fast response times while maintaining a daily update cycle aligned with our posting schedule.
  • Removed the existing static public/sitemap.xml, as Copilot suggested during review, since it could conflict with the sitemap generated by the script.
  • Priority Aggregation:
    • Recent Posts (< 30 days old): Priority 0.8
    • Archive/Old Posts: Priority 0.5
    • Main Aggregators (Categories/Home): Priority 1.0
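
The priority tiers above could be expressed as a small pure function. This is a minimal sketch rather than the PR's actual code: `postPriority`, `AGGREGATOR_PRIORITY`, and the choice to measure a post's age from a single date supplied by the caller are assumptions made for illustration.

```typescript
// Hypothetical helper; the names and the age reference date are assumptions.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000

// Priority for main aggregators (categories/home), per the tiers listed above.
const AGGREGATOR_PRIORITY = 1.0

// Returns 0.8 for posts younger than 30 days and 0.5 for everything older.
function postPriority(postDate: Date, now: Date = new Date()): number {
  const ageMs = now.getTime() - postDate.getTime()
  return ageMs < THIRTY_DAYS_MS ? 0.8 : 0.5
}
```

Passing `now` explicitly keeps the function deterministic, which makes the 30-day boundary easy to unit test.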

When does it run?

The timing depends on your environment:

  • During next build: When you deploy your site, Next.js runs this script once. It fetches all 1,500+ URLs and bundles them so the sitemap is ready the second the site goes live.
  • Every 24 Hours (Automatic Update): In production, the script is set to revalidate every 86,400 seconds (24 hours). When Google's bot (or a user) visits /blog/sitemap.xml after this period, Next.js quietly re-runs the script in the background to fetch new posts from WordPress and updates the XML file on the fly.
  • In Development (npm run dev): It runs every single time you refresh the /blog/sitemap.xml page in your browser so you can see your changes instantly.
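
In Next.js, time-based revalidation like this is commonly requested through the `next: { revalidate }` option on `fetch`. The sketch below only builds the request options so the 24-hour window is explicit. `buildGraphQLRequest` and `RevalidatingInit` are hypothetical names, and this thread does not show whether the PR sets `revalidate` on `fetch` or as a route segment export, so treat the placement as an assumption.

```typescript
// 24 hours in seconds, matching the revalidate: 86400 value quoted above.
const ONE_DAY_SECONDS = 24 * 60 * 60

// Local stand-in for the fetch options shape (assumption: fetch-level revalidation).
type RevalidatingInit = {
  method: 'POST'
  headers: Record<string, string>
  body: string
  next: { revalidate: number }
}

// Hypothetical helper: builds the options for a cached WPGraphQL POST request.
function buildGraphQLRequest(
  query: string,
  variables: Record<string, unknown> = {}
): RevalidatingInit {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables }),
    next: { revalidate: ONE_DAY_SECONDS },
  }
}
```

The returned object would be passed as the second argument to `fetch` inside a Next.js server context.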

Performance & Edge Cases Covered

  • WP Engine 502 Bad Gateway (Rate Limiting):
    • The Issue: Initial attempts used Promise.all to fetch Posts, Tags, Authors, and Categories concurrently. This triggered WPEngine's burst-limit protection, causing intermittent 502 errors.
    • The Fix: The final version switched to Sequential Step-Fetching. By awaiting each entity group one by one, we respect narrow API rate-limit windows without significantly sacrificing build-time speed.
  • URL Deduplication: WordPress often returns the same post in multiple categories. The script uses a Set to track added URLs, ensuring each article has exactly one canonical entry based on its primary category.
  • Data Sanitization (404 Prevention):
    • The Issue: Posts categorized as "Uncategorized" would generate /blog/uncategorized/[slug], which doesn't exist in our frontend routing.
    • The Fix: The generator now skips any post that is uncategorized or missing a valid slug. This ensures Google never crawls a "broken" URL generated by the CMS defaults.
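
The three safeguards above (sequential fetching, Set-based deduplication, and skipping uncategorized or slug-less posts) can be sketched as follows. This is an illustration under stated assumptions, not the code in app/sitemap.ts: `PostRef`, `primaryCategorySlug`, `buildPostUrls`, and `fetchSequentially` are hypothetical names.

```typescript
// Hypothetical minimal post shape; field names are assumptions.
type PostRef = { slug: string | null; primaryCategorySlug: string | null }

// Builds post URLs, skipping uncategorized or slug-less posts and
// using a Set so each URL is emitted exactly once.
function buildPostUrls(baseUrl: string, posts: PostRef[]): string[] {
  const seen = new Set<string>()
  const urls: string[] = []
  for (const post of posts) {
    const category = post.primaryCategorySlug
    if (!post.slug || !category || category === 'uncategorized') continue
    const url = `${baseUrl}/${encodeURIComponent(category)}/${encodeURIComponent(post.slug)}`
    if (seen.has(url)) continue
    seen.add(url)
    urls.push(url)
  }
  return urls
}

// Runs async fetchers one after another (instead of Promise.all),
// so only one request group is in flight at a time.
async function fetchSequentially<T>(fetchers: Array<() => Promise<T>>): Promise<T[]> {
  const results: T[] = []
  for (const fetcher of fetchers) {
    results.push(await fetcher())
  }
  return results
}
```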

Important edge case (multiple AI agents fell short here lol)

  • While reviewing, multiple AI agents suggested that uncategorized posts should fall back to /technology by default; otherwise the page would return a 404, because our code has no UI for uncategorized pages.
  • I tried implementing this, but eventually realized that if we replace the uncategorized slug with /technology, we'd still get a 404 because the rewritten URL would still break.
  • Then I checked what "uncategorized" actually means: it would apply to a blog whose data comes from a route other than /technology or /community. That is not possible in practice, because behind the scenes our blog data comes from WordPress and Hashnode, and on Hashnode we only have two categories, one for tech blogs and one for community blogs. The data is stored in these two repositories: https://github.com/keploy/tech-blog and https://github.com/keploy/community-blog. There is no possible case of "uncategorized" here.
  • I also confirmed this by running the following script:
    curl -s -X POST -H "Content-Type: application/json" \
      -d '{"query": "{ categories(first: 100) { edges { node { slug count } } } }"}' \
      https://wp.keploy.io/graphql | jq '.data.categories.edges[].node'

This confirmed the following:
[Screenshot 2026-03-30 at 7:25:48 PM]

References and sources of truth:

Type of Change

  • Chore (maintenance, refactoring, tooling updates)
  • Bug fix (non-breaking change that fixes an issue)
  • New feature (change that adds functionality)
  • Breaking Change (may require updates in existing code)
  • UI improvement (visual or design changes)
  • Performance improvement (optimization or efficiency enhancements)
  • Documentation update (changes to README, guides, etc.)
  • CI (updates to continuous integration workflows)
  • Revert (undo a previous commit or merge)

Environment and Dependencies

  • New Dependencies:
  • Configuration Changes:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added corresponding tests
  • I have run the build command to ensure there are no build errors
  • My changes have been tested across relevant browsers/devices
  • For UI changes, I've included visual evidence of my changes

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings March 30, 2026 10:59

kilo-code-bot Bot commented Mar 30, 2026

Code Review Summary

Status: 4 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 2 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

CRITICAL

| File | Line | Issue |
|---|---|---|
| app/sitemap.ts | 27 | Error log console.error("fetchGraphQL res not ok:", ...) lacks actionable next steps - should guide users on what to investigate (e.g., "Check WP_API_URL configuration and verify the WordPress GraphQL endpoint is reachable") |
| app/sitemap.ts | 33 | Error log console.error('WPGraphQL fetch error:', error) lacks actionable next steps - should guide users on what to investigate (e.g., "Check network connectivity and verify WP_API_URL is correct") |

WARNING

| File | Line | Issue |
|---|---|---|
| app/sitemap.ts | 213 | No validation of post.modified before parsing as Date - could create Invalid Date objects if field is missing or malformed |
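
One way to address the warning above is to validate the value before building a `Date`. The helper below is a sketch under that assumption; `parseModified` is a hypothetical name and is not part of the reviewed file.

```typescript
// Hypothetical guard: returns a Date only when the input is a non-empty
// string that parses to a valid date; otherwise returns undefined so the
// caller can omit lastModified instead of emitting an Invalid Date.
function parseModified(value: unknown): Date | undefined {
  if (typeof value !== 'string' || value.trim() === '') return undefined
  const parsed = new Date(value)
  return Number.isNaN(parsed.getTime()) ? undefined : parsed
}
```
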
Resolved Issues (fixed in commit a9e70f6)
  • DEBUG log gating - All console.log("DEBUG: ...") statements in fetchAllTaxonomies have been properly gated with process.env.NODE_ENV === 'development' and changed to console.debug() (lines 100-114)
  • public/sitemap.xml was deleted - resolves the invalid XML header issue
  • Conflict between static public/sitemap.xml and dynamic app/sitemap.ts - resolved by deletion of static file
Other Observations (not blocking)
  • baseUrl is hard-coded to 'https://keploy.io/blog' - consider using a centralized config if one exists in the codebase
  • The code assumes res.ok implies a successful GraphQL response, but GraphQL can return { errors: [...] } without a data field - consider checking json.errors before returning
Files Reviewed (2 files)
  • app/sitemap.ts - 3 remaining issues (dynamic sitemap generation)
  • public/sitemap.xml - Deleted (good - resolves previous issues)

Reviewed by claude-4.5-opus-20251124 · 214,435 tokens

Contributor

Copilot AI left a comment

Pull request overview

Adds a Next.js App Router sitemap endpoint and related TypeScript config updates to support dynamic sitemap generation from the WordPress GraphQL backend.

Changes:

  • Added app/sitemap.ts implementing MetadataRoute.Sitemap with static + WP-derived dynamic entries.
  • Updated tsconfig.json (formatting + Next TS plugin + .next/types include + enabled strictNullChecks).
  • Updated public/sitemap.xml contents (but this now conflicts with the dynamic sitemap route).

Reviewed changes

Copilot reviewed 1 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| tsconfig.json | Adjusts TS config for Next.js tooling and enables strictNullChecks. |
| public/sitemap.xml | Updates the static sitemap file, but introduces invalid content and a route conflict with the new dynamic sitemap. |
| app/sitemap.ts | Implements dynamic sitemap generation by paginating WordPress posts via GraphQL and emitting MetadataRoute.Sitemap. |

…d authors from wpgraphql

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
amaan-bhati and others added 3 commits March 30, 2026 18:23
… bad gateway edge case

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: Amaan Bhati <94218318+amaan-bhati@users.noreply.github.com>
Comment thread app/sitemap.ts Outdated
`
const data = await fetchGraphQL<any>(query, { after: after || null })
if (!data?.[type]) {
console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`)

[WARNING]: Debug logs use console.log instead of console.debug.

Per the custom instruction "If debug log statements have functions that can be expensive in non-debug mode (function still executing even if debug mode is not enabled), ensure to gate them" — these console.log calls with template literal interpolation execute in production, incurring unnecessary overhead.

Consider changing to console.debug(...) or gating with a debug flag:

Suggested change
- console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`)
+ if (process.env.NODE_ENV === 'development') console.debug(`fetchGraphQL returned missing data for ${type}`)

Comment thread app/sitemap.ts Outdated
break
}

console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`)

[WARNING]: Debug log executes in production.

This console.log labeled as DEBUG should use console.debug to avoid production log noise and unnecessary string interpolation overhead. Per custom instruction #3: "Info logs should be really relevant info to the user; if not, they should be debug."

Suggested change
- console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`)
+ if (process.env.NODE_ENV === 'development') console.debug(`Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`)

Comment thread app/sitemap.ts Outdated

// Failsafe to prevent excessive polling
if (allNodes.length > 5000) {
console.log(`DEBUG: Failsafe triggered for ${type}`)

[WARNING]: Debug log should be gated.

Same issue as above — this DEBUG-prefixed log uses console.log and executes in production.

Suggested change
- console.log(`DEBUG: Failsafe triggered for ${type}`)
+ if (process.env.NODE_ENV === 'development') console.debug(`Failsafe triggered for ${type}`)

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.


Comment thread app/sitemap.ts
Comment on lines +126 to +130
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
const baseUrl = 'https://keploy.io/blog'

// Sequential fetching to deeply respect WP Engine GraphQL burst limits
const posts = await fetchAllPosts()

Copilot AI Mar 30, 2026

The new sitemap endpoint is SEO-critical but isn’t covered by existing e2e tests. Consider adding a Playwright test that requests /blog/sitemap.xml, asserts a 200 + application/xml content-type, and verifies it contains a few expected URLs.
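
The URL assertions in such a test could lean on a small pure helper like the one below. This is a sketch only: `extractSitemapUrls` and `missingSitemapUrls` are hypothetical names, the regex-based parsing is a deliberate simplification for smoke testing, and the Playwright wiring (requesting /blog/sitemap.xml, checking the status and content type, then passing the response body in) is left out.

```typescript
// Extracts every <loc> value from a sitemap XML string.
// A regex is enough for a smoke test; it is not a full XML parser.
function extractSitemapUrls(xml: string): string[] {
  const urls: string[] = []
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(xml)) !== null) {
    urls.push(match[1])
  }
  return urls
}

// Returns the expected URLs that are absent from the sitemap,
// so a test can assert that the result is empty.
function missingSitemapUrls(xml: string, expected: string[]): string[] {
  const present = new Set(extractSitemapUrls(xml))
  return expected.filter(url => !present.has(url))
}
```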

Copilot uses AI. Check for mistakes.
Comment thread app/sitemap.ts
Comment on lines +26 to +34
if (!res.ok) {
console.error("fetchGraphQL res not ok:", res.status, res.statusText)
return null
}
const json = await res.json()
return json.data as T
} catch (error) {
console.error('WPGraphQL fetch error:', error)
return null

Copilot AI Mar 30, 2026

These error logs don’t provide a clear next step, and they’ll appear in production if WPGraphQL is transiently unavailable (e.g., rate limiting). Consider improving the message to include actionable guidance (e.g., check WPGraphQL endpoint/env vars, retry later) and/or handling retries/backoff; also consider avoiding logging full errors repeatedly during ISR to reduce log noise.

Copilot generated this review using guidance from organization custom instructions.
Comment thread app/sitemap.ts
Comment on lines +27 to +33
console.error("fetchGraphQL res not ok:", res.status, res.statusText)
return null
}
const json = await res.json()
return json.data as T
} catch (error) {
console.error('WPGraphQL fetch error:', error)

Copilot AI Mar 30, 2026

fetchGraphQL ignores GraphQL-level errors (json.errors) and will return json.data even when the response contains errors or partial data. That can silently generate an incomplete/incorrect sitemap; consider checking json.errors (similar to lib/api.ts:25-28) and failing fast or returning null with an actionable error message.

Suggested change
-     console.error("fetchGraphQL res not ok:", res.status, res.statusText)
-     return null
-   }
-   const json = await res.json()
-   return json.data as T
- } catch (error) {
-   console.error('WPGraphQL fetch error:', error)
+     console.error(
+       'fetchGraphQL HTTP error:',
+       res.status,
+       res.statusText,
+       '- please verify the WPGraphQL endpoint URL and check server logs for more details.'
+     )
+     return null
+   }
+   const json = await res.json()
+   if (json && Array.isArray(json.errors) && json.errors.length > 0) {
+     console.error(
+       'fetchGraphQL GraphQL error(s) returned from WPGraphQL. Inspect the query and server logs for details:',
+       json.errors
+     )
+     return null
+   }
+   return json.data as T
+ } catch (error) {
+   console.error(
+     'WPGraphQL fetch error. Please check network connectivity, the WPGraphQL endpoint configuration, and server logs:',
+     error
+   )

Comment thread app/sitemap.ts
Comment on lines +99 to +110
if (!data?.[type]) {
console.log(`DEBUG: fetchGraphQL returned missing data for ${type}`)
break
}

console.log(`DEBUG: Fetched ${type} page with ${data[type].edges.length} items. hasNextPage: ${data[type].pageInfo.hasNextPage}, endCursor: ${data[type].pageInfo.endCursor}`)

// Failsafe to prevent excessive polling
if (allNodes.length > 5000) {
console.log(`DEBUG: Failsafe triggered for ${type}`)
break
}

Copilot AI Mar 30, 2026

There are unconditional console.log “DEBUG:” statements in the sitemap generator. This will produce noisy logs during builds/ISR and doesn’t follow the logging guideline here (debug logs should be gated, and error logs should include a clear next step). Please remove these logs or gate them behind an explicit debug flag (e.g., an env var) and keep production output minimal/actionable.

Copilot generated this review using guidance from organization custom instructions.
Comment thread app/sitemap.ts
Comment on lines +158 to +165
tags.forEach(tag => {
sitemapData.push({
url: `${baseUrl}/tag/${encodeURIComponent(tag.slug)}`,
lastModified: tag.lastModified,
changeFrequency: 'weekly',
priority: 0.64,
})
})

Copilot AI Mar 30, 2026

Sitemap tag URLs are built from tag.slug, but the site’s tag routes/links appear to use the tag name (e.g. components/tag.tsx links to /tag/${name}, and the previous sitemap contained URLs like /tag/Feature%20Flags). Using slug here may create URLs that are inconsistent with internal linking (and can cause duplicate/incorrect indexing). Consider generating tag URLs using the same value the route expects (or update routing to consistently use slugs).

Comment thread app/sitemap.ts
Comment on lines +177 to +185
// Process Authors (verified exact path: /blog/authors/[slug])
authors.forEach(author => {
sitemapData.push({
url: `${baseUrl}/authors/${encodeURIComponent(author.slug)}`,
lastModified: author.lastModified,
changeFrequency: 'weekly',
priority: 0.64,
})
})

Copilot AI Mar 30, 2026

Author URLs in the sitemap are built from WP user slug, but the site’s author routing uses sanitizeAuthorSlug(ppmaAuthorName) (see components/AuthorMapping.tsx / pages/authors/[slug].tsx). This mismatch can yield non-canonical or even 404 author URLs in the sitemap. Consider deriving author slugs using sanitizeAuthorSlug (e.g., by aggregating author names from posts) or by querying name and sanitizing it instead of using WP slug directly.

…pilot review

Updated debug logging to use console.debug in development mode.

Signed-off-by: Amaan Bhati <94218318+amaan-bhati@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.


Comment thread app/sitemap.ts
Comment on lines +183 to +190
// Process Authors (verified exact path: /blog/authors/[slug])
authors.forEach(author => {
sitemapData.push({
url: `${baseUrl}/authors/${encodeURIComponent(author.slug)}`,
lastModified: author.lastModified,
changeFrequency: 'weekly',
priority: 0.64,
})

Copilot AI Mar 31, 2026

Author sitemap entries are built from WPGraphQL users.node.slug (/authors/${slug}), but the existing author route (pages/authors/[slug].tsx) generates/looks up slugs via sanitizeAuthorSlug(ppmaAuthorName) derived from posts. These formats likely won’t match, leading to sitemap URLs that 404. Consider deriving author slugs the same way the route does (from ppmaAuthorName) or reusing sanitizeAuthorSlug when building author URLs.

Comment thread app/sitemap.ts
Comment on lines +86 to +92
slug
posts(first: 1) {
nodes {
modified
}
}
}

Copilot AI Mar 31, 2026

fetchAllTaxonomies uses posts(first: 1) without an explicit orderby. This makes lastModified potentially incorrect (often returns most recent by DATE, not by MODIFIED), which undermines the goal of accurate <lastmod> values. Add an explicit order (e.g., order by MODIFIED DESC) to ensure the returned modified is actually the latest for that taxonomy/author.
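
A query with the explicit ordering suggested above might look like the sketch below. WPGraphQL post connections accept `where: { orderby: { field, order } }`, and `MODIFIED` is one of the orderby fields. The helper name `buildTaxonomyQuery`, the operation name, and the page size of 100 are assumptions. The `pageInfo`/`edges` shape and the `$after` variable mirror the snippets quoted elsewhere in this review.

```typescript
// The connection names passed to fetchAllTaxonomies elsewhere in this review.
type TaxonomyType = 'tags' | 'categories' | 'users'

// Hypothetical query builder: asks each node for its single most recently
// MODIFIED post, so lastModified does not depend on the default (date) order.
function buildTaxonomyQuery(type: TaxonomyType): string {
  return `
    query SitemapTaxonomy($after: String) {
      ${type}(first: 100, after: $after) {
        pageInfo { hasNextPage endCursor }
        edges {
          node {
            slug
            posts(first: 1, where: { orderby: { field: MODIFIED, order: DESC } }) {
              nodes { modified }
            }
          }
        }
      }
    }
  `
}
```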

Comment thread app/sitemap.ts
Comment on lines +173 to +180
// Process Categories
categories.forEach(cat => {
sitemapData.push({
url: `${baseUrl}/${encodeURIComponent(cat.slug)}`,
lastModified: cat.lastModified,
changeFrequency: 'weekly',
priority: 0.80,
})

Copilot AI Mar 31, 2026

The sitemap currently emits URLs for all WP categories (${baseUrl}/${cat.slug}), but the Next.js routes in this repo only support /technology and /community (no generic category route). If WP ever contains other categories with posts, this will publish invalid URLs to crawlers. Consider filtering categories (and post primaryCategory) to the set of routes the frontend actually serves, or mapping WP categories to supported route segments.
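
The filtering suggested above can be sketched as an allowlist of the route segments the frontend serves. The two slugs come from this thread. `SUPPORTED_CATEGORY_ROUTES` and `filterSupportedCategories` are hypothetical names, and keeping the list in a constant (rather than deriving it from the router) is an assumption.

```typescript
// Route segments the frontend actually serves, per this review thread.
const SUPPORTED_CATEGORY_ROUTES = new Set(['technology', 'community'])

// Keeps only categories whose slug maps to an existing route, so the
// sitemap never advertises a category page that would 404.
function filterSupportedCategories<T extends { slug: string }>(categories: T[]): T[] {
  return categories.filter(category => SUPPORTED_CATEGORY_ROUTES.has(category.slug))
}
```

The same set could be reused when building post URLs, so a post whose primary category is outside the allowlist is skipped as well.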

Comment thread app/sitemap.ts
Comment on lines +132 to +140
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
const baseUrl = 'https://keploy.io/blog'

// Sequential fetching to deeply respect WP Engine GraphQL burst limits
const posts = await fetchAllPosts()
const tags = await fetchAllTaxonomies('tags')
const categories = await fetchAllTaxonomies('categories')
const authors = await fetchAllTaxonomies('users')

Copilot AI Mar 31, 2026

There are Playwright E2E tests for SEO, but none cover /blog/sitemap.xml. Adding a basic test that requests the sitemap and asserts it returns XML with a few expected URLs (and no obvious 404 targets) would help prevent regressions in sitemap generation and routing alignment.

@amaan-bhati
Member Author

Closing this PR since we found a cheaper, faster, and more optimized approach in #374.
